108 results found.
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Chinese English
Availability:
From Owner
License:
LDC User Agreement for Non-Members
Size:
- MByte
Production Status:
Existing-used
Use:
structural information
-
Paper title:CxGBERT: BERT meets Construction Grammar
-
Paper track:Long paper/
-
Paper status:Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Harish Tayyar Madabushi | OntoNotes Release 5.0 | /N |
Documentation:
https://catalog.ldc.upenn.edu/docs/LDC2013T19/
Written
Corpus,
Language Type:
Monolingual
Languages:
Chinese
Availability:
From Owner
License:
Size:
1068411 entries
Production Status:
Newly created-finished
Use:
Machine Learning
-
Paper title:MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization
-
Paper track:Long/Resources and Evaluation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Canwen Xu | MATINF | /N |
Documentation:
None
Modality Independent
Corpus,
Language Type:
Bilingual
Languages:
Chinese English
Availability:
Freely Available
License:
CC0-1.0
Size:
1.01 GByte
Production Status:
Newly created-finished
Use:
Knowledge Discovery/Representation
-
Paper title:MOOCCube: A Large-scale Data Repository for NLP Applications in MOOCs
-
Paper track:Short/NLP Applications
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jifan Yu | MOOCCube | /N |
Documentation:
Yes; the documentation is publicly available in Chinese and English versions.
Written
Corpus,
Language Type:
Multilingual
Languages:
Chinese English French Japanese Korean Russian
Availability:
Freely Available
License:
Size:
5000 sentences
Production Status:
Newly created-in progress
Use:
Analysis of cross-linguistic morphosyntactic divergences
-
Paper title:Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences
-
Paper track:Long/Resources and Evaluation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Dmitry Nikolaev | Aligned sub-corpus of Parallel Universal Dependencies | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Chinese
Availability:
LDC
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
-
Paper title:Meta-Transfer Learning for Code-Switched Speech Recognition
-
Paper track:Short/Speech and Multimodality
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Genta Indra Winata | HKUST Mandarin Telephone Speech, Part 1 | /N |
Documentation:
None
Written
Ontology,
Language Type:
Monolingual
Languages:
Chinese English French Japanese
Availability:
Freely Available
License:
MIT
Size:
345.4 MByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
-
Paper title:Neighborhood Matching Network for Entity Alignment
-
Paper track:Long/Information Extraction
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yuting Wu | DBP15K | /N |
Documentation:
English documentation is publicly available.
Written
Corpus,
Language Type:
Monolingual
Languages:
Chinese
Availability:
Freely Available
License:
OpenSource
Size:
50 MByte
Production Status:
Newly created-in progress
Use:
Dialogue
-
Paper title:Towards Conversational Recommendation over Multi-Type Dialogs
-
Paper track:Long/Dialogue and Interactive Systems
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Zeming Liu | DuRecDial | /N |
Documentation:
None
Written
Treebank,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
None
Production Status:
Existing-used
Use:
Parsing and Tagging
-
Paper title:Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
-
Paper track:Short/Machine Learning for NLP
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Universal Dependencies | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
From NIST
License:
Size:
None
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
-
Paper title:Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
-
Paper track:Short/Machine Learning for NLP
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Reuters RCV1/RCV2 Multilingual Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Bilingual
Languages:
Chinese English
Availability:
Freely Available
License:
BSD
Size:
6 GByte
Production Status:
Existing-used
Use:
Summarisation
-
Paper title:Attend, Translate and Summarize: An Efficient Method for Neural Cross-Lingual Summarization
-
Paper track:Long/Summarization
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Junnan Zhu | NCLS-corpora | /N |
Documentation:
English documentation available at https://github.com/ZNLP/NCLS-Corpora




